Abstract

Entropic dynamics, a program that aims at deriving the laws of physics 
from standard probabilistic and entropic rules for processing information, 
is developed further. We calculate the probability for an arbitrary path 
followed by a system as it moves from given initial to final states. For an 
appropriately chosen configuration space the path of maximum probability 
reproduces Newtonian dynamics. 



1 Introduction 

It is not unusual to hear that science consists in using information about the 
world for the purpose of predicting, modeling, and/or controlling phenomena 
of interest. If this vague image turns out to be even remotely accurate then 
we expect that the laws of science should reflect, at least to some extent, the 
methods for manipulating information. Here we wish to entertain a far more 
radical hypothesis: perhaps the laws of physics are nothing but rules of inference. 
In this view the laws of physics are not laws of nature but are merely the rules 
we follow when processing the information that happens to be relevant to the 
physical problem at hand. The evidence supporting this notion is already quite 
considerable: most of the formal structure of statistical mechanics pQ and of 
quantum theory (see e.g. [5]) can be derived as examples of inference. 

The basic difficulty is that the available information is usually incomplete 
and one must learn to handle uncertainty. This requires addressing three prob- 
lems; the first two have been satisfactorily solved, the third one has not. First, 
one must represent one's partial state of knowledge as a web of interconnected 
beliefs with no internal inconsistencies; the tools to do it are probabilities [31 0]. 
Second, when new information becomes available the beliefs must be correspond- 
ingly updated. The instrument for updating is relative entropy and the resulting 
procedure — the ME method — is the only candidate that can claim universal ap- 
plicability. The ME method is based on the recognition that prior information 
is valuable and should not be revised except when demanded by new evidence;
it can handle arbitrary priors and arbitrary constraints; it includes MaxEnt and 
Bayes' rule as special cases; and it provides a quantitative assessment of the 
extent that distributions that deviate from the entropy maximum are ruled out. 
(See e.g. 01].) 

The third problem is trickier. When we say that "the laws of physics are not 
laws of nature" we do not mean that physics can be derived without any input 
from nature; quite the opposite. The statement "physics is inference" comes 
with considerable fine print. It implicitly assumes that one is doing inference 
about the "right things" on the basis of the "right information." The third and 
so far unsolved problem is that of identifying the questions that are interesting 
and the information that is relevant about a particular physical situation — 
this is where the connection to nature lies. The current approaches cannot be 
called a method — ultimately there is no scientific "method." We have learned 
from experience — a euphemism for trial and error, mostly error — which pieces 
of information happen to work well in each specific situation. Recent results, 
however, in model selection [T and in the development of a quantitative theory 
of inquiry and of relevance [8] represent considerable progress and point the way
towards more systematic approaches. 

In any case, once the relevant information has been identified, if the laws of 
physics are merely rules of inference, then we should be able to derive them. Our 
main concern is to derive laws of dynamics and the challenge — of course — is to 
avoid assuming the very laws of motion that we set out to derive. The formalism, 
which we refer to as entropic dynamics [9] [10], is of general applicability but to
be specific we focus on the example of particle mechanics. 

In a previous paper we derived Newtonian mechanics without assuming 
a principle of least action, or concepts of force, or momentum, or mass, and 
not even the notion of an absolute Newtonian time. None of these familiar 
concepts are part of the input to the theory — they are all derived. As described 
in [11] the crucial step was the selection of a suitable statistical model for the
configuration space of a system of particles which amounts to specifying both 
the subject matter and the relevant background information. 

The objective of the present paper is to develop the formalism of entropic 
dynamics further. We address the same dynamically interesting question: Given 
an initial and a final state, what trajectory will the system follow? In [11] we had 
calculated the path of maximum probability and we showed that it corresponds 
to Newtonian dynamics. But the available information does not single out a 
unique path; here — and this is our main result — we calculate the probability 
for any arbitrary path between the given initial and final states. As a first 
application we verify that indeed the most probable path reproduces our earlier 
result. A more detailed study of fluctuations and diffusion about the Newtonian 
path will, however, be left for a future publication. 

We conclude with brief remarks about the asymmetry between past and 
future as seen from the unfamiliar perspective of entropic dynamics, and about 
a possible connection between this work and Nelson's derivation of quantum 
mechanics as a peculiar kind of diffusion process [12].



2 Physical space and configuration space



Consider one particle (or many) living in our familiar "physical" space (whatever
this might ultimately mean). There is a useful distinction to be drawn between
this physical space $\mathcal{Y}$ and the space of states or configuration space
$\mathcal{X}$. For simplicity we will assume that physical space $\mathcal{Y}$ is
flat and three dimensional; its geometry is given by the Euclidean metric
$ds^2 = \delta_{ab}\,dy^a dy^b$ (generalizations are straightforward). The
configuration space $\mathcal{X}$ for a single particle will also be assumed to be
three dimensional but it need not be flat. The interesting dynamics will arise
from its curvature. The main additional ingredient is that there is an
irreducible uncertainty in the location of the particle. Thus, when we say that
the particle is at the point $x \in \mathcal{X}$ what we mean is that its
"physical", "true" position $y \in \mathcal{Y}$ is somewhere in the vicinity of
$x$. This leads us to associate a probability distribution $p(y|x)$ to each point
$x$, and the space $\mathcal{X}$ is thus transformed into a statistical manifold:
a point $x$ is not a structureless dot but a fuzzy probability distribution. The
origin of these uncertainties is, at this point, left unspecified.

In [11] we adopted a Gaussian model,

$$ p(y|x) = \frac{\gamma^{1/2}(x)}{(2\pi)^{3/2}} \exp\left[-\frac{1}{2}\gamma_{ab}(x)(y^a - x^a)(y^b - x^b)\right] , \tag{1} $$

where $\gamma = \det\gamma_{ab}$. It incorporates the physically relevant
information of an estimate of the particle position, $\langle y^a \rangle = x^a$,
and of its small uncertainty as given by the covariance matrix,

$$ \gamma^{ab} = \langle (y^a - x^a)(y^b - x^b) \rangle , \tag{2} $$

which is the inverse of $\gamma_{ab}$, $\gamma^{ab}\gamma_{bc} = \delta^a_c$. The
choice of Gaussians is physically plausible but not strictly necessary. We are
trying to predict behavior at macro-scales in terms of assumptions we make (that
is, information we assume) about what is going on at some intermediate
meso-scales which themselves are the result of happenings at still shorter
micro-scales about which we know absolutely nothing. If the fuzziness in position
that we postulate at the meso-scale is the result of many unknown microscopic
influences going on at a much smaller micro-scale then general arguments such as
the central limit theorem lead us to expect Gaussians as the plausible mesoscopic
distributions for a very wide variety of microscopic conditions.
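As a small numerical illustration of eqs. (1) and (2), the sketch below assumes a
diagonal $\gamma_{ab}$, so that the density factorizes into one-dimensional
Gaussians, and checks by quadrature that each factor is normalized, has mean
$x^a$, and has variance equal to the inverse of the corresponding diagonal entry.
The function names and the numerical values of `x` and `gamma_diag` are
hypothetical choices made only for this example.

```python
import math

def gaussian_factor(y, x, gamma):
    """One coordinate factor of eq. (1) when gamma_ab is diagonal."""
    return math.sqrt(gamma / (2.0 * math.pi)) * math.exp(-0.5 * gamma * (y - x) ** 2)

def moments(x, gamma, half_width=10.0, n=4001):
    """Trapezoid-rule estimates of the norm, mean and variance of one factor."""
    sigma = 1.0 / math.sqrt(gamma)
    lo = x - half_width * sigma
    h = 2.0 * half_width * sigma / (n - 1)
    norm = mean = var = 0.0
    for i in range(n):
        y = lo + i * h
        w = h * (0.5 if i in (0, n - 1) else 1.0)  # trapezoid end-point weights
        p = gaussian_factor(y, x, gamma)
        norm += w * p
        mean += w * p * y
        var += w * p * (y - x) ** 2
    return norm, mean, var

# Hypothetical position x^a and diagonal entries of gamma_ab (illustrative values).
x = [0.3, -1.0, 2.0]
gamma_diag = [100.0, 400.0, 25.0]
results = [moments(xa, ga) for xa, ga in zip(x, gamma_diag)]
for (norm, mean, var), ga in zip(results, gamma_diag):
    print(f"norm={norm:.6f} mean={mean:.6f} var={var:.6f} 1/gamma={1.0 / ga:.6f}")
```

The printed variance of each factor should agree with `1/gamma`, which is the
diagonal case of the statement that $\gamma^{ab}$ is the inverse of $\gamma_{ab}$.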

To conclude the specification of the model we further impose that the Gaussians
be spherically symmetric with a small but non-uniform variance $\sigma^2(x)$
conveniently expressed in terms of a small constant $\sigma_0^2$ modulated by a
(positive) scalar field $\Phi(x)$,

$$ \gamma_{ab}(x) = \frac{1}{\sigma^2(x)}\,\delta_{ab} = \frac{\Phi(x)}{\sigma_0^2}\,\delta_{ab} . \tag{3} $$

The next feature is automatic; it requires no further assumptions: the
configuration space $\mathcal{X}$, when viewed as a statistical manifold, inherits
a geometry from the distributions $p(y|x)$. The distance between two neighboring
distributions $p(y|x)$ and $p(y|x + dx)$ is the unique measure of the extent that
one distribution can be statistically distinguished from the other: distinguishability
is distance (and possibly vice- versa, but that is a story for another paper [TP]). 
It is given by the information metric of Fisher and Rao [13] [5],

$$ d\ell^2 = g_{ab}\,dx^a dx^b \quad\text{with}\quad g_{ab} = \int dy\, p(y|x)\, \frac{\partial \log p(y|x)}{\partial x^a} \frac{\partial \log p(y|x)}{\partial x^b} . \tag{4} $$

The corresponding volume element is $dv = g^{1/2}(x)\,d^3x$ where
$g = \det g_{ab}$. Substituting (1) and (3) into (4) we obtain the information
metric for the manifold of spherically symmetric Gaussians,

$$ g_{ab}(x) = \frac{1}{\sigma^2(x)}\left(\delta_{ab} + 6\,\partial_a\sigma\,\partial_b\sigma\right) \approx \frac{\Phi(x)}{\sigma_0^2}\,\delta_{ab} = \gamma_{ab}(x) , \tag{5} $$

provided $\sigma_0$ is sufficiently small.
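The identification of distance with distinguishability can be checked
numerically. The sketch below uses the closed-form relative entropy between two
spherically symmetric three-dimensional Gaussians and compares it, for a small
displacement $dx$, with $\frac{1}{2} g_{ab}\,dx^a dx^b$ computed from eq. (5).
The field `phi`, the value of `SIGMA0`, and the displacement are hypothetical
choices; the gradient of $\sigma$ is obtained by finite differences.

```python
import math

SIGMA0 = 0.1  # small constant sigma_0 (illustrative value)

def phi(x):
    """Hypothetical positive modulating field Phi(x); any smooth choice would do."""
    return math.exp(4.0 * x[0])

def sigma(x):
    """Width from eq. (3): sigma^2(x) = sigma_0^2 / Phi(x)."""
    return SIGMA0 / math.sqrt(phi(x))

def kl_gaussians(x1, x0):
    """Relative entropy KL[p(y|x1) || p(y|x0)] for isotropic 3D Gaussians."""
    s1, s0 = sigma(x1), sigma(x0)
    r = (s1 / s0) ** 2
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x0))
    return 0.5 * (3.0 * r + d2 / s0 ** 2 - 3.0 - 3.0 * math.log(r))

def grad_sigma(x, h=1e-6):
    """Central-difference gradient of sigma(x)."""
    grad = []
    for a in range(3):
        xp, xm = list(x), list(x)
        xp[a] += h
        xm[a] -= h
        grad.append((sigma(xp) - sigma(xm)) / (2.0 * h))
    return grad

def metric(x):
    """Information metric of eq. (5): (delta_ab + 6 d_a sigma d_b sigma) / sigma^2."""
    s, ds = sigma(x), grad_sigma(x)
    return [[((1.0 if a == b else 0.0) + 6.0 * ds[a] * ds[b]) / s ** 2
             for b in range(3)] for a in range(3)]

x = [0.0, 0.0, 0.0]
dx = [1e-4, 2e-4, -1e-4]
g = metric(x)
half_dl2 = 0.5 * sum(g[a][b] * dx[a] * dx[b] for a in range(3) for b in range(3))
kl = kl_gaussians([xa + da for xa, da in zip(x, dx)], x)
# Leading approximation of eq. (5), g_ab ~ Phi delta_ab / sigma_0^2.
leading = 0.5 * phi(x) / SIGMA0 ** 2 * sum(d * d for d in dx)
print(kl, half_dl2, leading)
```

With this deliberately steep choice of `phi` the gradient term in eq. (5) is
visible as a few-percent difference between `half_dl2` and `leading`; it shrinks
as `SIGMA0` is made smaller, in line with the proviso stated after eq. (5).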



3 The probability of a path 

The fact that the state $x$ of the particle might be unknown is described by
a distribution $P(x)$. The path of a particle is an ordered sequence of $N + 1$
positions $\{x_0 \ldots x_N\}$. We want to calculate the probability

$$ P(x_1 \ldots x_{N-1}|x_0 x_N)\, dx_1 \ldots dx_{N-1} \tag{6} $$

that the path passes through small volume elements $dx_n$ at the intermediate
points $x_1 \ldots x_{N-1}$. Since

$$ P(x_1 \ldots x_{N-1}|x_0 x_N) = \frac{P(x_1 \ldots x_N|x_0)}{P(x_N|x_0)} , \tag{7} $$

our immediate interest will be to assign a probability $P(x_1 \ldots x_N|x_0)$ to
the ordered path $\{x_1 \ldots x_N\}$ starting at $x_0$.
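Equation (7) is just the product rule, and a discrete toy model makes its content
concrete. The sketch below is an illustration only: it replaces the configuration
space by four hypothetical lattice sites and uses a made-up single-step weight
`step_weight`. It builds $P(x_1 \ldots x_N|x_0)$ by brute-force enumeration,
divides by $P(x_N|x_0)$, and confirms that the result is a normalized
distribution over the intermediate points.

```python
import itertools
import math

STATES = [0, 1, 2, 3]  # hypothetical discrete stand-in for the configuration space

def step_weight(x_new, x_old):
    """Hypothetical single-step weight favouring short steps (illustration only)."""
    return math.exp(-0.5 * (x_new - x_old) ** 2)

def path_weight(path, x0):
    """Unnormalized weight of the ordered path x_1 ... x_N starting at x_0."""
    weight, prev = 1.0, x0
    for x in path:
        weight *= step_weight(x, prev)
        prev = x
    return weight

def bridge_probabilities(x0, xN, n_steps):
    """Eq. (7): divide P(x_1 ... x_N | x_0) by P(x_N | x_0) for every interior path."""
    all_paths = list(itertools.product(STATES, repeat=n_steps))
    z = sum(path_weight(p, x0) for p in all_paths)                 # overall normalization
    joint = {p: path_weight(p, x0) / z for p in all_paths}         # P(x_1 ... x_N | x_0)
    marginal = sum(pr for p, pr in joint.items() if p[-1] == xN)   # P(x_N | x_0)
    return {p[:-1]: pr / marginal for p, pr in joint.items() if p[-1] == xN}

bridge = bridge_probabilities(x0=0, xN=3, n_steps=3)
total = sum(bridge.values())
best = max(bridge, key=bridge.get)
print(total, best)
```

Note that the overall normalization `z` cancels in the ratio, so the conditional
distribution depends only on the relative weights of paths sharing the same end
points.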

Note that an external time has not been introduced. It is true that the path
is ordered so that along a given path the point $x_n$ reached after $n$ steps
could be construed to occur later than the point reached at the previous step,
$x_{n-1}$. But most important elements implicit in the notion of time are
conspicuously absent. For example, we still have no way to order temporally the
point $x_n$ reached along one path with the point $x'_n$ reached along a
different path. Statements to the effect that one occurs earlier or later or
simultaneously with the other are, at this point, completely meaningless. We have
not introduced a notion of simultaneity and therefore we do not have a notion of
an instant of time. Furthermore we do not have a notion of duration either; we
have not introduced a way to compare or measure intervals for the successive
steps. The statement that a certain step took, say, twice as long as the previous
step is, at this point, meaningless.

Without the notions of instant or of interval we do not have time. An 
important part of the program of deriving physics from inference consists in 
understanding how and where these temporal concepts arise. 



3.1 The single-step probability

To warm up we first calculate the probability $P(x_1|x_0)$ to reach $x_1$ in a
single step. Since we are ignorant not only about the true position $y_1$ but
also about the configuration space position $x_1$, the relevant distribution is
the joint distribution $P_J(x_1 y_1|x_0)$. We shall choose the distribution
$P_J(x_1 y_1|x_0)$ using the ME method, that is, by maximizing the single-step
entropy [6]

$$ S_J[P_J, P'_J] = -\int dx_1\, dy_1\, P_J(x_1 y_1|x_0) \log \frac{P_J(x_1 y_1|x_0)}{P'_J(x_1 y_1|x_0)} . \tag{8} $$

The prior: $P'_J(x_1 y_1|x_0)$ represents partial knowledge about the variables
$x_1$ and $y_1$ before we incorporate any information in the form of constraints.
Let

$$ P'_J(x_1 y_1|x_0) = P'(x_1|x_0)\, P'(y_1|x_0 x_1) , \tag{9} $$

and focus first on $P'(x_1|x_0)$. At this point it is not yet known how $x_1$ is
related to $x_0$ or to $y_1$. We do know that $x_1 \in \mathcal{X}$ labels some
probability distribution $p(y|x_1)$ in $\mathcal{X}$, eqs. (1) and (3), but we do
not yet know that it is the distribution of $y_1$, $p(y_1|x_1)$. Thus, we are
maximally ignorant about $x_1$ and, accordingly, we choose a uniform distribution
$P'(x_1|x_0) \propto g^{1/2}(x_1)$. For the second factor $P'(y_1|x_0 x_1)$ we
argue that the variables $y_1$ are meant to represent the actual (uncertain)
coordinates of a particle; we assume that in the absence of any information to
the contrary the distribution of $y_1$ remains unchanged from the previous step,
$P'(y_1|x_0 x_1) = p(y_1|x_0)$. Thus, the joint prior $P'_J$ is

$$ P'_J(x_1 y_1|x_0) \propto g^{1/2}(x_1)\, p(y_1|x_0) . \tag{10} $$

The constraint: Next we incorporate the piece of information that establishes
the relation between $x_1$ and $y_1$. This is the constraint that demands updating
from the prior to the posterior. The posterior $P_J(x_1 y_1|x_0)$ belongs to the
family of distributions

$$ P_J(x_1 y_1|x_0) = P(x_1|x_0)\, P(y_1|x_1 x_0) , \tag{11} $$

where $P(x_1|x_0)$ is arbitrary and the second factor is constrained to be of the
form $P(y_1|x_0 x_1) = p(y_1|x_1)$.

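Substituting (10) and (11) into (8) and rearranging gives

$$ S_J[P_J, P'_J] = -\int dx_1\, P(x_1|x_0) \log \frac{P(x_1|x_0)}{g^{1/2}(x_1)} + \int dx_1\, P(x_1|x_0)\, S(x_1, x_0) , \tag{12} $$

where

$$ S(x_1, x_0) = -\int dy_1\, p(y_1|x_1) \log \frac{p(y_1|x_1)}{p(y_1|x_0)} . \tag{13} $$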

To determine $P(x_1|x_0)$ maximize eq. (12) subject to normalization. The first
term in eq. (12) makes $x_1$ as random as possible; by itself it would lead to a
uniform distribution $P(x_1|x_0) \propto g^{1/2}(x_1)$. The second term in eq.
(12) brings $x_1$ as close as possible to $x_0$; it would make
$P(x_1|x_0) \propto \delta(x_1 - x_0)$ and push $S(x_1, x_0)$ towards its maximum
value, $S(x_0, x_0) = 0$. The compromise between these two opposing tendencies is

$$ P_1(x_1|x_0) = \frac{1}{z(x_0)}\, g^{1/2}(x_1)\, e^{S(x_1, x_0)} , \tag{14} $$

where $z(x_0)$ is an appropriate normalization constant.
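To see what eq. (14) looks like in practice, the following sketch evaluates it on
a grid. It is a simplified one-dimensional analogue, not the three-dimensional
model of the text: the field `phi`, the value of `SIGMA0`, and the grid are
hypothetical, $S(x_1, x_0)$ is the closed-form relative entropy between
one-dimensional Gaussians, and $g^{1/2}$ is approximated by its leading form
$1/\sigma(x_1)$ in the spirit of eq. (5).

```python
import math

SIGMA0 = 0.1  # illustrative value of the small constant sigma_0

def phi(x):
    """Hypothetical positive scalar field Phi(x) on the interval [-1, 1]."""
    return 1.0 + 0.5 * x

def sigma2(x):
    """Variance sigma^2(x) = sigma_0^2 / Phi(x), as in eq. (3)."""
    return SIGMA0 ** 2 / phi(x)

def entropy_s(x1, x0):
    """S(x1, x0) of eq. (13) for 1D Gaussians: minus the relative entropy."""
    r = sigma2(x1) / sigma2(x0)
    return -0.5 * (r + (x1 - x0) ** 2 / sigma2(x0) - 1.0 - math.log(r))

def single_step(x0, grid):
    """Eq. (14) on a grid, with g^(1/2) approximated by 1/sigma(x1)."""
    weights = [math.exp(entropy_s(x1, x0)) / math.sqrt(sigma2(x1)) for x1 in grid]
    z = sum(weights)  # plays the role of the normalization z(x0)
    return [w / z for w in weights]

grid = [(i - 100) / 100.0 for i in range(201)]  # points from -1 to 1
x0 = 0.0
prob = single_step(x0, grid)
mode = grid[max(range(len(grid)), key=prob.__getitem__)]
mean = sum(p * x for p, x in zip(prob, grid))
spread = math.sqrt(sum(p * (x - mean) ** 2 for p, x in zip(prob, grid)))
print(mode, mean, spread)
```

The resulting distribution is sharply peaked near $x_0$ with a width of order
$\sigma_0$: the single step is short but otherwise a genuine jump.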

The probability (14) represents a discontinuous jump from $x_0$ to $x_1$ in a
single step. No information has been imposed to the effect that the particle
"moves" from $x_0$ to $x_1$ along a continuous trajectory. This is done next by
assuming that the continuous trajectory can be approximated by a sequence of
$N$ steps where $N$ is large.



3.2 The N-step probability

To assign the probability $P(x_1 \ldots x_N|x_0)$ for a path we focus on the joint
distribution $P_J(x_1 y_1 \ldots x_N y_N|x_0)$ and choose the distribution that
maximizes

$$ S_N[P_J, P'_J] = -\int \Big( \prod_{n=1}^{N} dx_n\, dy_n \Big)\, P_J \log \frac{P_J}{P'_J} . \tag{15} $$

The prior: To assign $P'_J(x_1 y_1 \ldots x_N y_N|x_0)$ consider the path in the
spaces of $x$s and $y$s separately ($x \in \mathcal{X}$ and $y \in \mathcal{Y}$),

$$ P'_J(x_1 y_1 \ldots x_N y_N|x_0) = P'(x_1 \ldots x_N|x_0)\, P'(y_1 \ldots y_N|x_0 x_1 \ldots x_N) . \tag{16} $$

The first factor is the prior probability of a path in the space $\mathcal{X}^N$.
To the extent that we know nothing about the relation between successive $x$s we
choose a uniform distribution in the space of paths,

$$ P'(x_1 \ldots x_N|x_0) \propto \prod_{n=1}^{N} g^{1/2}(x_n) . \tag{17} $$

The second factor is the prior probability of a path in the space $\mathcal{Y}^N$.
We assume that in the absence of any information to the contrary the distribution
of the $n$-th step $y_n$ retains memory only of the immediately preceding
$x_{n-1}$,

$$ P'(y_1 \ldots y_N|x_0 x_1 \ldots x_N) = \prod_{n=1}^{N} p(y_n|x_{n-1}) . \tag{18} $$

(This is not quite a Markov process; the distribution of $y_n$ does not retain
memory of the immediately preceding $y_{n-1} \in \mathcal{Y}$, only of
$x_{n-1} \in \mathcal{X}$.)

The constraint: Next we impose the information that relates $y_n$ to its
corresponding $x_n$. The posterior

$$ P_J(x_1 y_1 \ldots x_N y_N|x_0) = P(x_1 \ldots x_N|x_0)\, P(y_1 \ldots y_N|x_0 x_1 \ldots x_N) \tag{19} $$

is constrained to belong to the family of distributions such that

$$ P(y_1 \ldots y_N|x_0 x_1 \ldots x_N) = \prod_{n=1}^{N} p(y_n|x_n) . \tag{20} $$

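Substituting $P_J$ and $P'_J$ into eq. (15) and rearranging gives

$$ S_N[P_J, P'_J] = -\int \Big( \prod_{n=1}^{N} dx_n \Big)\, P(x_1 \ldots x_N|x_0) \log \frac{P(x_1 \ldots x_N|x_0)}{\prod_{n=1}^{N} g^{1/2}(x_n)} + \int \Big( \prod_{n=1}^{N} dx_n \Big)\, P(x_1 \ldots x_N|x_0) \sum_{n=1}^{N} S(x_n, x_{n-1}) , \tag{21} $$

where

$$ S(x_n, x_{n-1}) = -\int dy_n\, p(y_n|x_n) \log \frac{p(y_n|x_n)}{p(y_n|x_{n-1})} . \tag{22} $$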

As before the two integrals in eq. (21) represent opposing tendencies. The first
integral seeks to make $P(x_1 \ldots x_N|x_0)$ as random as possible with $x_n$
completely uncorrelated to $x_{n-1}$. The second integral introduces strong
correlations; it brings $x_n$ as close as possible to the preceding $x_{n-1}$.

The main result: Varying $P(x_1 \ldots x_N|x_0)$ to maximize eq. (21) subject to
normalization gives the probability density for a path starting at the initial
position $x_0$,

$$ P_N(x_1 \ldots x_N|x_0) = \frac{1}{Z(x_0)} \Big[ \prod_{n=1}^{N} g^{1/2}(x_n) \Big] \exp\Big[ \sum_{n=1}^{N} S(x_n, x_{n-1}) \Big] , \tag{23} $$

where $Z(x_0)$ is the appropriate normalization constant.

The probability density for the $N$-step path between given initial and final
positions $x_0$ and $x_N$ is given by (7), where $P_N(x_N|x_0)$ is obtained from
(23),

$$ P_N(x_N|x_0) = \frac{g^{1/2}(x_N)}{Z(x_0)} \int \Big[ \prod_{n=1}^{N-1} dx_n\, g^{1/2}(x_n) \Big] \exp\Big[ \sum_{n=1}^{N} S(x_n, x_{n-1}) \Big] . \tag{24} $$

Substituting back into (7) gives the desired answer

$$ P_N(x_1 \ldots x_{N-1}|x_0 x_N) = \frac{1}{Z(x_0, x_N)} \Big[ \prod_{n=1}^{N-1} g^{1/2}(x_n) \Big] \exp\Big[ \sum_{n=1}^{N} S(x_n, x_{n-1}) \Big] , \tag{25} $$

where $Z(x_0, x_N)$ is the appropriate normalization. Equations (23) and (25) are
the main results of this paper.
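Up to the normalization $Z(x_0, x_N)$, eq. (25) assigns a weight to any discrete
path, and ratios of weights can be computed directly. The sketch below does this
for spherically symmetric three-dimensional Gaussians, using the closed-form
relative entropy for $S(x_n, x_{n-1})$ and the leading form of eq. (5) for
$g^{1/2}$. The uniform field `phi`, the value of `SIGMA0`, and the two test paths
are hypothetical choices for illustration.

```python
import math

SIGMA0 = 0.1  # illustrative value of sigma_0

def phi(x):
    """Hypothetical modulating field; uniform here, so the geometry is flat."""
    return 1.0

def sigma2(x):
    """Variance sigma^2(x) = sigma_0^2 / Phi(x), as in eq. (3)."""
    return SIGMA0 ** 2 / phi(x)

def entropy_s(x_new, x_old):
    """S(x_n, x_{n-1}) of eq. (22) for spherically symmetric 3D Gaussians."""
    r = sigma2(x_new) / sigma2(x_old)
    d2 = sum((a - b) ** 2 for a, b in zip(x_new, x_old))
    return -0.5 * (3.0 * r + d2 / sigma2(x_old) - 3.0 - 3.0 * math.log(r))

def log_sqrt_g(x):
    """log g^(1/2)(x) using the leading form g_ab = delta_ab / sigma^2 of eq. (5)."""
    return -1.5 * math.log(sigma2(x))

def log_path_weight(path):
    """Log of the unnormalized eq. (25) for path = [x_0, x_1, ..., x_N]."""
    volume = sum(log_sqrt_g(x) for x in path[1:-1])
    action = sum(entropy_s(path[n], path[n - 1]) for n in range(1, len(path)))
    return volume + action

def straight_path(start, end, n_steps):
    """Equally spaced points on the straight segment from start to end."""
    return [[s + (e - s) * n / n_steps for s, e in zip(start, end)]
            for n in range(n_steps + 1)]

start, end, n_steps = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], 10
straight = straight_path(start, end, n_steps)
detour = [list(p) for p in straight]
detour[5][1] += 0.2  # push the midpoint sideways by two sigma_0
log_ratio = log_path_weight(straight) - log_path_weight(detour)
print(log_ratio)
```

A positive `log_ratio` means the straight path is the more probable of the two;
a non-uniform `phi` would change both the volume factors and the entropies.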



4 The most probable path 

We restrict our analysis of eq. (25) to calculating the most probable path from
the initial position $x_0$ to the final $x_N$. For fixed volume elements,
$dV_n = g^{1/2}(x_n)\,dx_n = dV$, the path of maximum probability is that which
maximizes

$$ A(x_1 \ldots x_{N-1}|x_0 x_N) = \sum_{n=1}^{N} S(x_n, x_{n-1}) , \tag{26} $$
where $x_0$ and $x_N$ are fixed. The maximum probability path is the polygonal
path that brings the successive $x$s as "close" to each other as possible. For
large $N$ we expect this to be the shortest path between the given end points and,
as shown below, this is indeed the case. The variation of $A$ is

$$ \delta A = \sum_{n=1}^{N-1} \frac{\partial}{\partial x_n^a} \Big[ \sum_{m=1}^{N} S(x_m, x_{m-1}) \Big] \delta x_n^a = \sum_{n=1}^{N-1} \frac{\partial}{\partial x_n^a} \big[ S(x_n, x_{n-1}) + S(x_{n+1}, x_n) \big]\, \delta x_n^a . \tag{27} $$

For large $N$ we assume that successive $x$s along the path are sufficiently close
together that we can approximate

$$ S(x_n, x_{n-1}) = -\frac{1}{2} \Delta\ell^2_{n,n-1} = -\frac{1}{2}\, g_{ab}(x_{n-1}) (x_n^a - x_{n-1}^a)(x_n^b - x_{n-1}^b) . \tag{28} $$

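Next, introduce a parameter $\lambda$ along the trajectory, $x = x(\lambda)$. The
corresponding velocities $\dot{x}$ are

$$ \dot{x}_{n+1/2} \overset{\text{def}}{=} \frac{x_{n+1} - x_n}{\Delta\lambda} , \quad \dot{x}_{n-1/2} \overset{\text{def}}{=} \frac{x_n - x_{n-1}}{\Delta\lambda} \quad\text{and}\quad \dot{x}_n \overset{\text{def}}{=} \frac{x_{n+1} - x_{n-1}}{2\Delta\lambda} . \tag{29} $$

Expand

$$ g_{ac}(x_{n-1}) = g_{ac}(x_n) - g_{ac,d}(x_n)\, \dot{x}^d_{n-1/2}\, \Delta\lambda + \ldots , \tag{30} $$

and rearrange to get

$$ \delta A = \Delta\lambda^2 \sum_{n=1}^{N-1} \Big\{ g_{ac}(x_n)\, \frac{\dot{x}^a_{n+1/2} - \dot{x}^a_{n-1/2}}{\Delta\lambda} + g_{ac,d}(x_n)\, \dot{x}^d_{n-1/2}\, \dot{x}^a_{n-1/2} - \frac{1}{2}\, g_{ab,c}(x_n)\, \dot{x}^a_{n+1/2}\, \dot{x}^b_{n+1/2} \Big\}\, \delta x_n^c , \tag{31} $$

where we recognize the acceleration

$$ \ddot{x}_n \overset{\text{def}}{=} \frac{\dot{x}_{n+1/2} - \dot{x}_{n-1/2}}{\Delta\lambda} = \frac{x_{n+1} - 2x_n + x_{n-1}}{(\Delta\lambda)^2} . \tag{32} $$

Substituting $\dot{x}_{n \pm 1/2} = \dot{x}_n \pm \frac{1}{2}\ddot{x}_n \Delta\lambda$ gives

$$ \delta A = \Delta\lambda^2 \sum_{n=1}^{N-1} \Big[ g_{ac}(x_n)\, \ddot{x}_n^a + g_{ac,d}(x_n) \Big( \dot{x}_n^d - \frac{1}{2} \ddot{x}_n^d \Delta\lambda \Big) \Big( \dot{x}_n^a - \frac{1}{2} \ddot{x}_n^a \Delta\lambda \Big) - \frac{1}{2}\, g_{ab,c}(x_n) \Big( \dot{x}_n^a + \frac{1}{2} \ddot{x}_n^a \Delta\lambda \Big) \Big( \dot{x}_n^b + \frac{1}{2} \ddot{x}_n^b \Delta\lambda \Big) \Big]\, \delta x_n^c . \tag{33} $$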



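If the distribution of points along the trajectory is sufficiently dense,
$\Delta\lambda \to 0$, the leading term is (with the standard notation
$g_{ac,d} = \partial g_{ac}/\partial x^d$)

$$ \delta A = \Delta\lambda^2 \sum_{n=1}^{N-1} \Big[ g_{ac}(x_n)\, \ddot{x}_n^a + \frac{1}{2} \big( g_{ca,b} + g_{cb,a} - g_{ab,c} \big)\, \dot{x}_n^a \dot{x}_n^b \Big]\, \delta x_n^c , \tag{34} $$

or

$$ \delta A = \Delta\lambda^2 \sum_{n=1}^{N-1} g_{ad}(x_n) \big[ \ddot{x}_n^a + \Gamma^a_{bc}\, \dot{x}_n^b \dot{x}_n^c \big]\, \delta x_n^d , \tag{35} $$

where $\Gamma^c_{ab}$ are Christoffel symbols,

$$ \Gamma^c_{ab} = \frac{1}{2}\, g^{cd} \big( g_{da,b} + g_{db,a} - g_{ab,d} \big) . \tag{36} $$

Setting $\delta A = 0$ for arbitrary variations $\delta x_n^d$ leads to the
geodesic equation,

$$ \ddot{x}_n^a + \Gamma^a_{bc}\, \dot{x}_n^b \dot{x}_n^c = 0 . \tag{37} $$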

Incidentally, $\lambda$ turns out to be an affine parameter, that is, up to an
unimportant scale factor it measures the length along the path,

$$ d\lambda^2 = C\, g_{ab}\, dx^a dx^b . \tag{38} $$

Conclusion: The most probable continuous path between two given end points 
is the geodesic that joins them. 
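The geodesic equation can be integrated numerically as a consistency check. The
sketch below assumes a conformally flat metric of the form of eq. (5) with a
hypothetical field `phi`, builds the Christoffel symbols of eq. (36) from
finite-difference derivatives of the metric, and advances eq. (37) with a
standard fourth-order Runge-Kutta step. It then verifies that
$g_{ab}\dot{x}^a\dot{x}^b$ stays constant along the solution, as expected when
$\lambda$ is an affine parameter, eq. (38). All names and numerical values are
illustrative assumptions.

```python
SIGMA0 = 0.5  # illustrative value; only sets the overall scale of the metric

def phi(x):
    """Hypothetical smooth positive field Phi(x)."""
    return 1.0 + 0.5 * (x[0] ** 2 + x[1] ** 2 + x[2] ** 2)

def metric(x):
    """Conformally flat metric g_ab = Phi(x) delta_ab / sigma_0^2, cf. eq. (5)."""
    f = phi(x) / SIGMA0 ** 2
    return [[f if a == b else 0.0 for b in range(3)] for a in range(3)]

def metric_derivs(x, h=1e-5):
    """dg[d][a][b] approximates g_ab,d by central differences."""
    dg = []
    for d in range(3):
        xp, xm = list(x), list(x)
        xp[d] += h
        xm[d] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[a][b] - gm[a][b]) / (2.0 * h) for b in range(3)]
                   for a in range(3)])
    return dg

def acceleration(x, v):
    """Geodesic eq. (37): x''^c = -Gamma^c_ab x'^a x'^b, with Gamma from eq. (36)."""
    g, dg = metric(x), metric_derivs(x)
    acc = []
    for c in range(3):
        inv = 1.0 / g[c][c]  # the metric is diagonal, so g^{cd} = delta^{cd} / g_cc
        total = 0.0
        for a in range(3):
            for b in range(3):
                gamma = 0.5 * inv * (dg[b][c][a] + dg[a][c][b] - dg[c][a][b])
                total += gamma * v[a] * v[b]
        acc.append(-total)
    return acc

def rk4_step(x, v, h):
    """One Runge-Kutta step for the first-order system (x, v)."""
    def deriv(state):
        return state[3:] + acceleration(state[:3], state[3:])
    def shifted(state, k, s):
        return [a + s * b for a, b in zip(state, k)]
    y = x + v
    k1 = deriv(y)
    k2 = deriv(shifted(y, k1, 0.5 * h))
    k3 = deriv(shifted(y, k2, 0.5 * h))
    k4 = deriv(shifted(y, k3, h))
    y = [a + h * (p + 2 * q + 2 * r + s) / 6.0
         for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y[:3], y[3:]

def speed2(x, v):
    """g_ab x'^a x'^b, constant along a geodesic with affine parameter."""
    g = metric(x)
    return sum(g[a][b] * v[a] * v[b] for a in range(3) for b in range(3))

x, v = [1.0, 0.0, 0.0], [0.0, 0.3, 0.1]
initial = speed2(x, v)
for _ in range(500):
    x, v = rk4_step(x, v, 0.01)
drift = abs(speed2(x, v) / initial - 1.0)
print(drift)
```

The printed relative drift should be tiny, limited only by the step size and the
finite-difference derivatives.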

Remark: It is interesting that although the ME inference implicitly assumed
a directionality from the initial $x_0$ to the final $x_N$ through the prior for
$y_n$ which establishes a connection with the "previous" instant,
$p(y_n|x_{n-1})$, in the continuum limit the sense of direction is lost. The most
probable trajectory is fully reversible.

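The treatment above is general; it is valid for dynamics on any statistical
manifold. Now we restrict ourselves to the manifold of spherically symmetric
Gaussians defined by (1) and (3). The parametrization in terms of $\lambda$ is
convenient but completely arbitrary. Let us instead introduce a new non-affine
"time" parameter $t = t(\lambda)$ defined by

$$ dt \overset{\text{def}}{=} \frac{d\lambda}{2^{1/2}\,\Phi} \quad\text{or}\quad T_t \overset{\text{def}}{=} \frac{1}{2\sigma_0^2}\, \delta_{ab}\, \frac{dx^a}{dt} \frac{dx^b}{dt} = \Phi ; \tag{39} $$

then the geodesic equation becomes

$$ \frac{d^2 x^c}{dt^2} + \Gamma^c_{ab}\, \frac{dx^a}{dt} \frac{dx^b}{dt} = -\frac{d^2 t/d\lambda^2}{(dt/d\lambda)^2}\, \frac{dx^c}{dt} , \tag{40} $$

and using (36),

$$ \Gamma^c_{ab} = \frac{1}{2\Phi} \big( \partial_a\Phi\, \delta^c_b + \partial_b\Phi\, \delta^c_a - \partial^c\Phi\, \delta_{ab} \big) , \tag{41} $$

we get

$$ \frac{d^2 x^a}{dt^2} = \frac{\partial^a\Phi}{2\Phi}\, \delta_{bc}\, \frac{dx^b}{dt} \frac{dx^c}{dt} , \tag{42} $$

which, using $T_t = \Phi$ from eq. (39), gives

$$ \frac{1}{\sigma_0^2}\, \frac{d^2 x^a}{dt^2} = \partial^a \Phi . \tag{43} $$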
This is Newton's equation. To make it explicit just change notation and call

$$ \frac{1}{\sigma_0^2} \overset{\text{def}}{=} m \quad\text{and}\quad \Phi(x) \overset{\text{def}}{=} E - V(x) , \tag{44} $$

where $E$ is a constant. The result is Newton's $F = ma$ and energy conservation,

$$ m\, \frac{d^2 x^a}{dt^2} = -\partial^a V(x) \quad\text{and}\quad \frac{m}{2}\, \delta_{ab}\, \frac{dx^a}{dt} \frac{dx^b}{dt} + V(x) = E . \tag{45} $$

Conclusion: We have reproduced the results obtained in [11]. The Newtonian
mass $m$ and force $F_a = -\partial_a V$ are "explained" in terms of position
uncertainties; the uniform uncertainty $\sigma_0$ explains mass, while the
modulating field $\Phi(x)$ explains forces.
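The dictionary of eq. (44) can be exercised numerically. The sketch below
integrates Newton's equation with $m = 1/\sigma_0^2$ for a hypothetical harmonic
potential, starting from a velocity chosen so that $T_t = \Phi$ holds initially,
and monitors how well $T_t = \Phi = E - V$ is maintained along the trajectory.
The potential, the constants `ENERGY` and `K`, and the velocity-Verlet integrator
are assumptions made for this illustration.

```python
import math

SIGMA0 = 0.5               # illustrative value of sigma_0
MASS = 1.0 / SIGMA0 ** 2   # eq. (44): m = 1 / sigma_0^2
ENERGY = 3.0               # hypothetical constant E
K = 1.0                    # hypothetical spring constant

def potential(x):
    """Hypothetical harmonic potential V(x) = K |x|^2 / 2."""
    return 0.5 * K * sum(c * c for c in x)

def force(x):
    """F^a = -dV/dx^a for the harmonic potential."""
    return [-K * c for c in x]

def phi(x):
    """Eq. (44): Phi(x) = E - V(x)."""
    return ENERGY - potential(x)

def kinetic(v):
    """T_t of eq. (39): delta_ab v^a v^b / (2 sigma_0^2) = m |v|^2 / 2."""
    return 0.5 * MASS * sum(c * c for c in v)

def verlet_step(x, v, dt):
    """One velocity-Verlet step for m x'' = F(x)."""
    a = [f / MASS for f in force(x)]
    x_new = [xi + vi * dt + 0.5 * ai * dt * dt for xi, vi, ai in zip(x, v, a)]
    a_new = [f / MASS for f in force(x_new)]
    v_new = [vi + 0.5 * (ai + bi) * dt for vi, ai, bi in zip(v, a, a_new)]
    return x_new, v_new

x = [1.0, 0.0, 0.0]
speed = math.sqrt(2.0 * phi(x) / MASS)  # start on the surface T_t = Phi
v = [0.0, speed, 0.0]
worst = 0.0
for _ in range(5000):
    x, v = verlet_step(x, v, 1e-3)
    worst = max(worst, abs(kinetic(v) - phi(x)))
print(worst)
```

The largest deviation `worst` measures the violation of energy conservation by
the integrator; it should shrink roughly quadratically with the time step.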

The extension to more particles interacting among themselves is straightforward.
Further analysis will be pursued elsewhere. Here we only mention that
a most remarkable feature of the time $t$ selected according to (39) is that isolated
subsystems all keep the same common time which confirms t as the universal 
Newtonian time. Thus, the advantage of the Newtonian time goes beyond the 
fact that it simplifies the equations of motion. It is the only choice of time such 
that isolated clocks will keep synchronized. 



5 Final remarks 

We conclude with two comments. The first concerns the arrow of time, an 
interesting puzzle that has plagued physics ever since Boltzmann [14]. The
problem is that the laws of physics are symmetric under time reversal (forget,
for the moment, the tiny T violations in K-meson decay) but everything else
in nature seems to indicate a clear asymmetry between the past and the future. 
How can we derive an arrow of time from underlying laws of nature that are 
symmetric? The short answer is: we can't. 

In a few brief lines we cannot do full justice to this problem but we can 
hint that entropic dynamics offers a promising new way to address it. We note, 
first, that entropic dynamics does not assume any underlying laws of nature — 
whether they be symmetric or not. And second, that information about the past 
is treated differently from information about the future. Entropic dynamics 
does not attempt to explain the asymmetry between past and future. The 
asymmetry is accepted as prior information. It is the known but unproven 
truth that provides the foundation from which all sorts of other inferences will 
be derived. From the point of view of entropic dynamics the problem is not to 
explain the arrow of time, but rather to explain the reversibility of the laws of 
physics. And in this endeavor entropic dynamics succeeds. Laws of physics such 
as F = ma were derived to be time reversible despite the fact that the entropic 
argument clearly stipulates an arrow of time. More generally, we showed that 
the probability of any continuous path is independent of the direction in which 
it is traversed. (Incidentally, if the paths were not continuous but composed of 
small discrete steps then the predictions would include tiny T violations.) 
The second comment concerns the "physical" origin of the position uncertainties,
an important issue about which we have remained silent. The fact that particle
masses are a manifestation of these uncertainties, $\sigma_0^2 \propto 1/m$,
might be a clue. Among the various approaches to quantum theory the version
developed by Nelson, and known as stochastic mechanics [12], is particularly
attractive because it derives quantum theory from the hypothesis that particles
in empty space are subject to a peculiar Brownian motion characterized by
position fluctuations such that $\sigma^2 \propto \hbar/m$. It is difficult to
avoid the conclusion that the uncertainties underlying entropic dynamics might be
explained by quantum effects. However, while this is a very tempting possibility,
an even more interesting and radical conjecture is that the explanatory arrow
runs in the opposite direction. The radical conjecture would be that the same
entropic dynamics that already explains mass, and interactions, and Newton's
mechanics, might also, and with no further assumptions, explain quantum mechanics
as well. Perhaps physics is nothing but inference after all.